
Collaborating Authors

 Daejeon


Rethinking Forward Processes for Score-Based Data Assimilation in High Dimensions

Eunbi Yoon, Donghan Kim, Dae Wook Kim

arXiv.org Machine Learning

Data assimilation is the process of estimating the time-evolving state of a dynamical system by integrating model predictions with noisy observations. It is commonly formulated as Bayesian filtering, but classical filters often struggle with accuracy or computational feasibility in high dimensions. Recently, score-based generative models have emerged as a scalable approach to high-dimensional data assimilation, enabling accurate modeling and sampling of complex distributions. However, existing score-based filters often specify the forward process independently of the data-assimilation problem. As a result, the measurement-update step relies on heuristic approximations of the likelihood score, whose errors can accumulate and degrade performance over time. Here, we propose a measurement-aware score-based filter (MASF) that defines a measurement-aware forward process directly from the measurement equation. This construction makes the likelihood score analytically tractable: for linear measurements, we derive the exact likelihood score and combine it with a learned prior score to obtain the posterior score. Numerical experiments covering a range of settings, including high-dimensional datasets, demonstrate improved accuracy and stability over existing score-based filters.
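The phrase "combine it with a learned prior score to obtain the posterior score" rests on Bayes' rule in score form: grad log p(x | y) = grad log p(x) + grad log p(y | x). The sketch below illustrates only that decomposition for the clean state, assuming a linear measurement y = H x + eps with Gaussian noise eps ~ N(0, R). It does not reproduce MASF's measurement-aware forward process, which the abstract does not specify. A Gaussian prior stands in for the learned prior score network (an assumption), and all function names are hypothetical. Because the prior is Gaussian here, the combined score should vanish at the Kalman-update posterior mean, which gives a simple check.

```python
import numpy as np

def gaussian_prior_score(x, mean, cov):
    # Score of N(mean, cov): -cov^{-1} (x - mean).
    # Hypothetical stand-in for a learned prior score network.
    return -np.linalg.solve(cov, x - mean)

def linear_likelihood_score(x, y, H, R):
    # Score of p(y | x) for y = H x + eps, eps ~ N(0, R):
    # grad_x log p(y | x) = H^T R^{-1} (y - H x).
    return H.T @ np.linalg.solve(R, y - H @ x)

def posterior_score(x, y, H, R, prior_score):
    # Bayes' rule in score form: posterior score = prior score + likelihood score.
    return prior_score(x) + linear_likelihood_score(x, y, H, R)

# Small linear-Gaussian test problem with random (illustrative) values.
rng = np.random.default_rng(0)
n, m = 4, 2
mean = np.zeros(n)
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)          # symmetric positive-definite prior covariance
H = rng.standard_normal((m, n))      # linear measurement operator
R = 0.5 * np.eye(m)                  # measurement-noise covariance
y = rng.standard_normal(m)           # observation

prior = lambda x: gaussian_prior_score(x, mean, P)

# Kalman-update posterior mean for the same problem.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
x_post = mean + K @ (y - H @ mean)

# The posterior score vanishes at the posterior mean (mode of a Gaussian).
print(np.allclose(posterior_score(x_post, y, H, R, prior), 0.0))  # True
```

In a score-based filter the Gaussian stand-in would be replaced by a trained network, while the likelihood term stays analytic for linear measurements.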



18d3a2f3068d6c669dcae19ceca1bc24-Paper-Conference.pdf

Neural Information Processing Systems

The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of this process is not yet understood, and it is unclear whether it can benefit machine-learning algorithms.


Diversity Matters When Learning From Ensembles

Neural Information Processing Systems

While some recent works propose to distill an ensemble model into a single model to reduce such costs, there is still a performance gap between the ensemble and distilled models.
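For readers unfamiliar with distilling an ensemble into a single model, a common generic objective trains the student to match the ensemble's averaged predictive distribution under a KL divergence. The sketch below shows only that standard objective; it is not the method proposed in this paper, whose details are not given in the snippet. Array shapes, function names, and the omission of temperature scaling are assumptions made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax along the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_distillation_loss(teacher_logits, student_logits, eps=1e-12):
    # teacher_logits: (M, B, C) logits from M ensemble members (hypothetical layout).
    # student_logits: (B, C) logits from the single distilled model.
    # Returns the batch mean of KL(p_ens || p_student), where p_ens averages
    # the members' softmax outputs.
    p_ens = softmax(teacher_logits).mean(axis=0)              # (B, C)
    log_p_ens = np.log(p_ens + eps)
    log_p_student = np.log(softmax(student_logits) + eps)
    kl = (p_ens * (log_p_ens - log_p_student)).sum(axis=-1)   # (B,)
    return kl.mean()

rng = np.random.default_rng(0)
teacher = rng.standard_normal((5, 8, 3))       # 5 members, batch of 8, 3 classes

# A student whose softmax equals the ensemble average attains (near) zero loss.
matched = np.log(softmax(teacher).mean(axis=0))
print(abs(ensemble_distillation_loss(teacher, matched)) < 1e-9)   # True

# A uniform student does not.
print(ensemble_distillation_loss(teacher, np.zeros((8, 3))) > 0)  # True
```

Averaging probabilities rather than logits keeps the target a valid distribution; note that matching only the average discards information about how the members disagree, which is one way to see why ensemble diversity can be lost in distillation.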





Copycats

Neural Information Processing Systems

In the past, medical imaging (MI) datasets were frequently proprietary, confined to particular institutions, and stored in private repositories. In this setting, there is a pressing need for alternative models of data sharing, documentation, and governance. Within this context, the emergence of Community-Contributed Platforms (CCPs) opened the possibility of sharing medical datasets publicly.